(a) $$\mathrm{Util}(s_t, a) = E[\mathrm{Reward}(s_t, a)] + \max_{\text{policies}} E\Big[\sum_{j=t+1}^{N-1} \mathrm{Reward}_j\Big]$$

(b) $$\mathrm{Util}(s_t, a) = E[\mathrm{Reward}(s_t, a)] + \max_{\text{policies}} E\Big[\sum_{j=t+1}^{\infty} \gamma^{\,j-t}\, \mathrm{Reward}_j\Big]$$

Figure 3: (a) Utility (over a finite agent lifetime), defined as the expected sum of the immediate reward and the long-term reward under the best possible policy. $s_t$ is the state at time step $t$, $\mathrm{Reward}(s_t, a)$ is the immediate reward of executing action $a$ in state $s_t$, $N$ is the number of steps in the lifetime of the agent, and $\mathrm{Reward}_t$ is the reward at time step $t$. The operator $E[\cdot]$ stands for taking an expectation over all sources of randomness in the system; (b) Utility (ove ...
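
The finite-lifetime utility in panel (a) can be illustrated with a minimal Python sketch. It assumes the environment is a small Markov decision process with known transition probabilities, so that the maximization over policies can be carried out by backward induction from the last time step. The two-state model `P`, the reward table `R`, and the function name `finite_horizon_util` are hypothetical and chosen only for illustration; none of them comes from the source.

```python
# Hypothetical 2-state, 2-action MDP (illustration only, not from the source).
# P[s][a] is a list of (probability, next_state) pairs.
# R[s][a] is the expected immediate reward E[Reward(s, a)].
P = {
    0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}


def finite_horizon_util(P, R, N):
    """Return util[t][s][a] for t = 0..N-1 by backward induction.

    util[t][s][a] is the expected immediate reward plus the expected sum
    of later rewards (time steps t+1 .. N-1) under the best policy.
    Assumes Markov dynamics, which the caption itself does not state.
    """
    states = list(P)
    # After the last step of the lifetime there is no further reward.
    best_next = {s: 0.0 for s in states}
    util = [None] * N
    for t in range(N - 1, -1, -1):
        util[t] = {
            s: {
                a: R[s][a] + sum(p * best_next[s2] for p, s2 in P[s][a])
                for a in P[s]
            }
            for s in states
        }
        # Best achievable value from each state at time t.
        best_next = {s: max(util[t][s].values()) for s in states}
    return util


if __name__ == "__main__":
    util = finite_horizon_util(P, R, N=2)
    print(util[0][0][1])  # utility of action 1 in state 0 at t = 0
```

With a lifetime of `N = 2`, the utility at the final step reduces to the immediate reward alone, while at `t = 0` it also includes the best expected reward obtainable at `t = 1`.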